benchmark task
Simple random search of static linear policies is competitive for reinforcement learning
Model-free reinforcement learning aims to offer off-the-shelf solutions for controlling dynamical systems without requiring models of the system dynamics. We introduce a model-free random search algorithm for training static, linear policies for continuous control problems. Common evaluation methodology shows that our method matches state-of-the-art sample efficiency on the benchmark MuJoCo locomotion tasks. Nonetheless, more rigorous evaluation reveals that the assessment of performance on these benchmarks is optimistic. We evaluate the performance of our method over hundreds of random seeds and many different hyperparameter configurations for each benchmark task. This extensive evaluation is possible because of the small computational footprint of our method. Our simulations highlight a high variability in performance in these benchmark tasks, indicating that commonly used estimations of sample efficiency do not adequately evaluate the performance of RL algorithms. Our results stress the need for new baselines, benchmarks and evaluation methodology for RL algorithms.
Appendix A Additional results This appendix section shows additional results and corresponding plots to support the insights
Section A.2 shows results using a chat-style verbalized numeric Section A.3 shows results on four extra benchmark tasks made available with Finally, Section A.5 presents and discusses results on feature In this section, we evaluate risk score calibration on the income prediction task across different subpopulations, such as typically done as part of a fairness audit. Figures A1-A2 show group-conditional calibration curves for all models on the ACSIncome task, evaluated on three subgroups specified by the race attribute in the ACS data. We show the three race categories with largest representation. The'Mixtral 8x22B' and'Yi 34B' models shown are the worst offenders, where samples belonging to the'Black' population see consistently lower scores for the same positive label probability when compared to the'Asian' or'White' populations. On average, the'Mixtral 8x22B (it)' model classifies a Black individual with a In fact, this score bias can be reversed for some base models, overestimating scores from Black individuals compared with other subgroups.
- Oceania > New Zealand (0.04)
- North America > United States > California (0.04)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- North America > United States > California (0.04)
- (6 more...)
- Research Report > New Finding (0.92)
- Questionnaire & Opinion Survey (0.68)
- Government (0.92)
- Education (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.98)
- (2 more...)
Sourcerer: Sample-based Maximum Entropy Source Distribution Estimation Julius V etter,1,2, Guy Moss
Scientific modeling applications often require estimating a distribution of parameters consistent with a dataset of observations--an inference task also known as source distribution estimation. This problem can be ill-posed, however, since many different source distributions might produce the same distribution of data-consistent simulations. To make a principled choice among many equally valid sources, we propose an approach which targets the maximum entropy distribution, i.e., prioritizes retaining as much uncertainty as possible.
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- Oceania > Australia (0.04)
- Government (0.93)
- Health & Medicine > Therapeutic Area (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.61)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization
Designing and deriving effective model-based reinforcement learning (MBRL) algorithms with a performance improvement guarantee is challenging, mainly attributed to the high coupling between model learning and policy optimization. Many prior methods that rely on return discrepancy to guide model learning ignore the impacts of model shift, which can lead to performance deterioration due to excessive model updates. Other methods use performance difference bound to explicitly consider model shift. However, these methods rely on a fixed threshold to constrain model shift, resulting in a heavy dependence on the threshold and a lack of adaptability during the training process. In this paper, we theoretically derive an optimization objective that can unify model shift and model bias and then formulate a fine-tuning process. This process adaptively adjusts the model updates to get a performance improvement guarantee while avoiding model over-fitting.
- North America > United States > District of Columbia (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > North Carolina (0.04)
- (8 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Overview (0.67)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Public Health (1.00)
- (11 more...)
- Law (0.46)
- Information Technology (0.46)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.69)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.69)
- Asia > South Korea > Gyeongsangnam-do > Changwon (0.04)
- Asia > Middle East > Israel (0.04)
- Workflow (0.95)
- Research Report > New Finding (0.46)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Energy > Renewable (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)